[WIP] Compile compatibility #789

Closed
wants to merge 45 commits into from

@vmoens vmoens commented May 24, 2024

The goal of this PR is to make tensordict compatible with torch.compile without introducing any regressions.
We have now reached good coverage, although tensorclasses and functional calls still need more support.

Enabling torch.compile will achieve two goals:

  • Make tensordict features exportable (for deployment on hardware, for instance)
  • Get higher throughput

Speedups (local)

| Test Name | Mean | Std Dev |
|---|---|---|
| test_compile_add_one_flat[dict-compile] | 33.5000 | 7.18 |
| test_compile_add_one_flat[dict-eager] | 59.2500 | 12.70 |
| test_compile_add_one_flat[tensordict-compile] | 32.5830 | 6.98 |
| test_compile_add_one_flat[tensordict-eager] | 61.1670 | 13.11 |
| test_compile_add_one_nested[dict-compile] | 16.1670 | 3.46 |
| test_compile_add_one_nested[dict-eager] | 35.2910 | 7.56 |
| test_compile_add_one_nested[tensordict-compile] | 21.2080 | 4.55 |
| test_compile_add_one_nested[tensordict-eager] | 49.8340 | 10.68 |
| test_compile_assign_and_add[dict-compile] | 81.0410 | 17.37 |
| test_compile_assign_and_add[dict-eager] | 222.459 | 47.68 |
| test_compile_assign_and_add[tensordict-compile] | 83.6660 | 17.93 |
| test_compile_assign_and_add[tensordict-eager] | 353.083 | 75.67 |
| test_compile_assign_and_add_stack[compile] | 309.958 | 66.43 |
| test_compile_assign_and_add_stack[eager] | 645.917 | 138.43 |
| test_compile_copy_flat[dict-compile] | 26.2500 | 5.63 |
| test_compile_copy_flat[dict-eager] | 22.4580 | 4.81 |
| test_compile_copy_flat[tensordict-compile] | 14.0000 | 3.00 |
| test_compile_copy_flat[tensordict-eager] | 20.6660 | 4.43 |
| test_compile_copy_nested[dict-compile] | 24.5830 | 5.27 |
| test_compile_copy_nested[dict-eager] | 21.5410 | 4.62 |
| test_compile_copy_nested[tensordict-compile] | 10.8330 | 2.32 |
| test_compile_copy_nested[tensordict-eager] | 17.1250 | 3.67 |
| test_compile_indexing[int-dict-compile] | 14.375 | 3.08 |
| test_compile_indexing[int-dict-eager] | 5.2920 | 1.13 |
| test_compile_indexing[int-tensordict-compile] | 14.292 | 3.06 |
| test_compile_indexing[int-tensordict-eager] | 4.8330 | 1.04 |
| test_compile_indexing[slice-dict-compile] | 14.500 | 3.11 |
| test_compile_indexing[slice-dict-eager] | 5.5840 | 1.20 |
| test_compile_indexing[slice-tensordict-compile] | 14.667 | 3.14 |
| test_compile_indexing[slice-tensordict-eager] | 4.6660 | 1.0 |
| test_compile_indexing[tensor-dict-compile] | 7.1250 | 1.53 |
| test_compile_indexing[tensor-dict-eager] | 6.3330 | 1.36 |
| test_compile_indexing[tensor-tensordict-compile] | 7.1660 | 1.85 |
| test_compile_indexing[tensor-tensordict-eager] | 6.9580 | 0.75 |
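
To read the table at a glance, it can help to turn each eager/compile pair into a ratio. The sketch below does that for a few rows copied from the table. It assumes Mean is a per-call time (so lower is better); the dictionary layout and the `speedup` helper are illustrative names, not part of the benchmark suite.

```python
# Mean values copied from the table above (units as reported by the benchmark run).
means = {
    "add_one_flat[tensordict]": {"eager": 61.1670, "compile": 32.5830},
    "assign_and_add[tensordict]": {"eager": 353.083, "compile": 83.6660},
    "copy_nested[tensordict]": {"eager": 17.1250, "compile": 10.8330},
    "indexing[int-tensordict]": {"eager": 4.8330, "compile": 14.292},
}


def speedup(eager: float, compiled: float) -> float:
    """Return how many times faster the compiled run is (values below 1 mean slower)."""
    return eager / compiled


ratios = {name: speedup(m["eager"], m["compile"]) for name, m in means.items()}

for name, ratio in ratios.items():
    verdict = "faster" if ratio > 1 else "slower"
    print(f"{name}: {ratio:.2f}x ({verdict} when compiled)")
```

On these rows, the arithmetic and copy benchmarks have a lower mean when compiled, while integer indexing has a higher one.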

Compatible features:

  • get
  • set
  • getitem
  • TensorDictModule
  • TensorDictSequential
  • clone / copy
  • arithmetic ops
  • reshape

WIP (compatible through tricks in compile / td, or requiring code adaptation)

  • functional calls
  • __torch_function__ breaks: stack, cat need to be executed through TensorDict.stack

Not compatible

  • tensorclass

Related PRs / Issues

# __enter__ populates the module, __exit__ swaps the params back and possibly updates the `params` object
with params.to_module(module):
    output = module(input)

cc @jsuarez5341 @janblumenkamp @btx0424 @soumith @ezyang @matteobettini @albertbou92 @BY571 @Miffyli @teopir @nairbv @luisenp

@facebook-github-bot added the CLA Signed label May 24, 2024

github-actions bot commented May 24, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 56. Worsened: 28.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 39.2330μs 19.6395μs 50.9177 KOps/s 56.3278 KOps/s $\textbf{\color{#d91a1a}-9.60\%}$
test_plain_set_stack_nested 49.6630μs 19.8642μs 50.3419 KOps/s 56.3207 KOps/s $\textbf{\color{#d91a1a}-10.62\%}$
test_plain_set_nested_inplace 50.0140μs 20.9289μs 47.7808 KOps/s 50.5279 KOps/s $\textbf{\color{#d91a1a}-5.44\%}$
test_plain_set_stack_nested_inplace 55.3030μs 20.6476μs 48.4317 KOps/s 50.2862 KOps/s $\color{#d91a1a}-3.69\%$
test_items 23.6140μs 2.6180μs 381.9779 KOps/s 378.3066 KOps/s $\color{#35bf28}+0.97\%$
test_items_nested 2.2167ms 1.0062ms 993.8427 Ops/s 3.6951 KOps/s $\textbf{\color{#d91a1a}-73.10\%}$
test_items_nested_locked 1.6765ms 0.9923ms 1.0078 KOps/s 3.7577 KOps/s $\textbf{\color{#d91a1a}-73.18\%}$
test_items_nested_leaf 0.1239ms 78.7698μs 12.6952 KOps/s 12.8590 KOps/s $\color{#d91a1a}-1.27\%$
test_items_stack_nested 1.6257ms 1.0024ms 997.5718 Ops/s 3.6990 KOps/s $\textbf{\color{#d91a1a}-73.03\%}$
test_items_stack_nested_leaf 0.1533ms 79.0894μs 12.6439 KOps/s 13.2799 KOps/s $\color{#d91a1a}-4.79\%$
test_items_stack_nested_locked 1.6770ms 0.9988ms 1.0013 KOps/s 3.7057 KOps/s $\textbf{\color{#d91a1a}-72.98\%}$
test_keys 6.9790μs 1.0835μs 922.9771 KOps/s 254.6147 KOps/s $\textbf{\color{#35bf28}+262.50\%}$
test_keys_nested 4.2549ms 0.1412ms 7.0845 KOps/s 7.3408 KOps/s $\color{#d91a1a}-3.49\%$
test_keys_nested_locked 0.3100ms 0.1464ms 6.8307 KOps/s 7.0739 KOps/s $\color{#d91a1a}-3.44\%$
test_keys_nested_leaf 0.2269ms 0.1189ms 8.4070 KOps/s 8.5202 KOps/s $\color{#d91a1a}-1.33\%$
test_keys_stack_nested 0.2408ms 0.1402ms 7.1329 KOps/s 7.3895 KOps/s $\color{#d91a1a}-3.47\%$
test_keys_stack_nested_leaf 0.1960ms 0.1200ms 8.3363 KOps/s 8.6799 KOps/s $\color{#d91a1a}-3.96\%$
test_keys_stack_nested_locked 0.2194ms 0.1467ms 6.8181 KOps/s 7.1884 KOps/s $\textbf{\color{#d91a1a}-5.15\%}$
test_values 11.6842μs 1.2144μs 823.4496 KOps/s 855.0884 KOps/s $\color{#d91a1a}-3.70\%$
test_values_nested 78.9470μs 39.8091μs 25.1199 KOps/s 20.1042 KOps/s $\textbf{\color{#35bf28}+24.95\%}$
test_values_nested_locked 77.6260μs 39.3189μs 25.4331 KOps/s 19.9375 KOps/s $\textbf{\color{#35bf28}+27.56\%}$
test_values_nested_leaf 92.7340μs 34.7116μs 28.8088 KOps/s 22.1588 KOps/s $\textbf{\color{#35bf28}+30.01\%}$
test_values_stack_nested 95.5390μs 40.1319μs 24.9178 KOps/s 19.6181 KOps/s $\textbf{\color{#35bf28}+27.01\%}$
test_values_stack_nested_leaf 79.2780μs 35.0474μs 28.5328 KOps/s 22.5054 KOps/s $\textbf{\color{#35bf28}+26.78\%}$
test_values_stack_nested_locked 83.9770μs 40.1158μs 24.9278 KOps/s 19.5471 KOps/s $\textbf{\color{#35bf28}+27.53\%}$
test_membership 2.3449μs 0.2661μs 3.7577 MOps/s 741.3191 KOps/s $\textbf{\color{#35bf28}+406.90\%}$
test_membership_nested 0.9633ms 2.9507μs 338.9033 KOps/s 290.3204 KOps/s $\textbf{\color{#35bf28}+16.73\%}$
test_membership_nested_leaf 43.5420μs 2.8804μs 347.1769 KOps/s 289.1807 KOps/s $\textbf{\color{#35bf28}+20.06\%}$
test_membership_stacked_nested 25.5580μs 2.8966μs 345.2292 KOps/s 268.1196 KOps/s $\textbf{\color{#35bf28}+28.76\%}$
test_membership_stacked_nested_leaf 24.3150μs 2.9047μs 344.2723 KOps/s 285.2066 KOps/s $\textbf{\color{#35bf28}+20.71\%}$
test_membership_nested_last 41.8690μs 3.8195μs 261.8136 KOps/s 241.7426 KOps/s $\textbf{\color{#35bf28}+8.30\%}$
test_membership_nested_leaf_last 35.0260μs 3.8317μs 260.9830 KOps/s 244.2073 KOps/s $\textbf{\color{#35bf28}+6.87\%}$
test_membership_stacked_nested_last 48.4910μs 5.7575μs 173.6876 KOps/s 75.9033 KOps/s $\textbf{\color{#35bf28}+128.83\%}$
test_membership_stacked_nested_leaf_last 38.4020μs 5.8479μs 171.0014 KOps/s 75.1052 KOps/s $\textbf{\color{#35bf28}+127.68\%}$
test_nested_getleaf 53.9110μs 13.6463μs 73.2797 KOps/s 93.6077 KOps/s $\textbf{\color{#d91a1a}-21.72\%}$
test_nested_get 41.9380μs 13.0242μs 76.7802 KOps/s 100.3202 KOps/s $\textbf{\color{#d91a1a}-23.46\%}$
test_stacked_getleaf 53.8900μs 13.5081μs 74.0296 KOps/s 92.6363 KOps/s $\textbf{\color{#d91a1a}-20.09\%}$
test_stacked_get 57.6070μs 12.9361μs 77.3029 KOps/s 99.3819 KOps/s $\textbf{\color{#d91a1a}-22.22\%}$
test_nested_getitemleaf 53.4500μs 14.2245μs 70.3014 KOps/s 89.0389 KOps/s $\textbf{\color{#d91a1a}-21.04\%}$
test_nested_getitem 57.0970μs 13.0898μs 76.3954 KOps/s 97.8701 KOps/s $\textbf{\color{#d91a1a}-21.94\%}$
test_stacked_getitemleaf 57.9880μs 14.0932μs 70.9564 KOps/s 89.7029 KOps/s $\textbf{\color{#d91a1a}-20.90\%}$
test_stacked_getitem 54.3010μs 13.1206μs 76.2162 KOps/s 99.0638 KOps/s $\textbf{\color{#d91a1a}-23.06\%}$
test_lock_nested 1.8996ms 0.3640ms 2.7469 KOps/s 2.7825 KOps/s $\color{#d91a1a}-1.28\%$
test_lock_stack_nested 0.4534ms 0.3172ms 3.1521 KOps/s 3.3073 KOps/s $\color{#d91a1a}-4.69\%$
test_unlock_nested 86.0342ms 0.4548ms 2.1987 KOps/s 2.3400 KOps/s $\textbf{\color{#d91a1a}-6.04\%}$
test_unlock_stack_nested 0.4527ms 0.3249ms 3.0775 KOps/s 3.2287 KOps/s $\color{#d91a1a}-4.68\%$
test_flatten_speed 0.2451ms 95.4212μs 10.4798 KOps/s 10.4381 KOps/s $\color{#35bf28}+0.40\%$
test_unflatten_speed 0.7815ms 0.4433ms 2.2557 KOps/s 2.4600 KOps/s $\textbf{\color{#d91a1a}-8.30\%}$
test_common_ops 4.4417ms 0.7048ms 1.4188 KOps/s 1.3322 KOps/s $\textbf{\color{#35bf28}+6.50\%}$
test_creation 87.4130μs 1.8934μs 528.1584 KOps/s 512.9202 KOps/s $\color{#35bf28}+2.97\%$
test_creation_empty 29.2040μs 9.6150μs 104.0044 KOps/s 84.0416 KOps/s $\textbf{\color{#35bf28}+23.75\%}$
test_creation_nested_1 40.2550μs 13.0867μs 76.4136 KOps/s 68.0673 KOps/s $\textbf{\color{#35bf28}+12.26\%}$
test_creation_nested_2 49.7630μs 15.8172μs 63.2222 KOps/s 55.1494 KOps/s $\textbf{\color{#35bf28}+14.64\%}$
test_clone 0.2580ms 13.2261μs 75.6082 KOps/s 71.6745 KOps/s $\textbf{\color{#35bf28}+5.49\%}$
test_getitem[int] 35.5060μs 11.4884μs 87.0442 KOps/s 86.5559 KOps/s $\color{#35bf28}+0.56\%$
test_getitem[slice_int] 77.7260μs 24.2233μs 41.2825 KOps/s 43.7737 KOps/s $\textbf{\color{#d91a1a}-5.69\%}$
test_getitem[range] 95.5080μs 44.9774μs 22.2334 KOps/s 15.8685 KOps/s $\textbf{\color{#35bf28}+40.11\%}$
test_getitem[tuple] 65.8730μs 19.0115μs 52.5998 KOps/s 51.9421 KOps/s $\color{#35bf28}+1.27\%$
test_getitem[list] 0.1162ms 40.2100μs 24.8694 KOps/s 23.3721 KOps/s $\textbf{\color{#35bf28}+6.41\%}$
test_setitem_dim[int] 49.6630μs 29.5403μs 33.8521 KOps/s 26.5246 KOps/s $\textbf{\color{#35bf28}+27.63\%}$
test_setitem_dim[slice_int] 96.1910μs 57.4017μs 17.4211 KOps/s 15.2924 KOps/s $\textbf{\color{#35bf28}+13.92\%}$
test_setitem_dim[range] 0.1533ms 79.0319μs 12.6531 KOps/s 11.2082 KOps/s $\textbf{\color{#35bf28}+12.89\%}$
test_setitem_dim[tuple] 91.2500μs 45.5052μs 21.9755 KOps/s 18.6747 KOps/s $\textbf{\color{#35bf28}+17.68\%}$
test_setitem 0.2309ms 18.8329μs 53.0985 KOps/s 47.1920 KOps/s $\textbf{\color{#35bf28}+12.52\%}$
test_set 0.2309ms 18.5304μs 53.9655 KOps/s 48.2493 KOps/s $\textbf{\color{#35bf28}+11.85\%}$
test_set_shared 4.4468ms 0.1459ms 6.8529 KOps/s 6.4714 KOps/s $\textbf{\color{#35bf28}+5.90\%}$
test_update 0.2465ms 20.3976μs 49.0254 KOps/s 43.1188 KOps/s $\textbf{\color{#35bf28}+13.70\%}$
test_update_nested 0.2736ms 27.9278μs 35.8066 KOps/s 32.0033 KOps/s $\textbf{\color{#35bf28}+11.88\%}$
test_update__nested 0.2124ms 25.3946μs 39.3784 KOps/s 39.4136 KOps/s $\color{#d91a1a}-0.09\%$
test_set_nested 0.2024ms 21.7490μs 45.9791 KOps/s 43.9382 KOps/s $\color{#35bf28}+4.64\%$
test_set_nested_new 0.8492ms 25.4490μs 39.2943 KOps/s 37.3543 KOps/s $\textbf{\color{#35bf28}+5.19\%}$
test_select 0.2296ms 42.3831μs 23.5943 KOps/s 24.1775 KOps/s $\color{#d91a1a}-2.41\%$
test_select_nested 0.2969ms 61.6501μs 16.2206 KOps/s 16.3932 KOps/s $\color{#d91a1a}-1.05\%$
test_exclude_nested 0.1454ms 74.4531μs 13.4313 KOps/s 8.2447 KOps/s $\textbf{\color{#35bf28}+62.91\%}$
test_empty[True] 0.5112ms 0.2834ms 3.5280 KOps/s 2.5247 KOps/s $\textbf{\color{#35bf28}+39.74\%}$
test_empty[False] 12.1603μs 1.1071μs 903.2525 KOps/s 906.3382 KOps/s $\color{#d91a1a}-0.34\%$
test_unbind_speed 0.3023ms 0.2575ms 3.8836 KOps/s 3.8754 KOps/s $\color{#35bf28}+0.21\%$
test_unbind_speed_stack0 0.3647ms 0.2547ms 3.9263 KOps/s 3.9612 KOps/s $\color{#d91a1a}-0.88\%$
test_unbind_speed_stack1 88.8440ms 0.7019ms 1.4247 KOps/s 1.3034 KOps/s $\textbf{\color{#35bf28}+9.31\%}$
test_split 86.2926ms 1.6102ms 621.0411 Ops/s 609.1152 Ops/s $\color{#35bf28}+1.96\%$
test_chunk 87.6960ms 1.6159ms 618.8487 Ops/s 659.3399 Ops/s $\textbf{\color{#d91a1a}-6.14\%}$
test_creation[device0] 0.2307ms 86.3018μs 11.5872 KOps/s 11.6497 KOps/s $\color{#d91a1a}-0.54\%$
test_creation_from_tensor 0.2538ms 86.8008μs 11.5206 KOps/s 11.3516 KOps/s $\color{#35bf28}+1.49\%$
test_add_one[memmap_tensor0] 0.7411ms 5.4633μs 183.0395 KOps/s 185.5124 KOps/s $\color{#d91a1a}-1.33\%$
test_contiguous[memmap_tensor0] 13.6950μs 0.6326μs 1.5808 MOps/s 1.5894 MOps/s $\color{#d91a1a}-0.54\%$
test_stack[memmap_tensor0] 55.8940μs 3.5632μs 280.6445 KOps/s 278.8074 KOps/s $\color{#35bf28}+0.66\%$
test_memmaptd_index 1.1647ms 0.2573ms 3.8866 KOps/s 3.8885 KOps/s $\color{#d91a1a}-0.05\%$
test_memmaptd_index_astensor 1.1103ms 0.3444ms 2.9035 KOps/s 2.5051 KOps/s $\textbf{\color{#35bf28}+15.90\%}$
test_memmaptd_index_op 1.2659ms 0.6119ms 1.6342 KOps/s 1.5151 KOps/s $\textbf{\color{#35bf28}+7.86\%}$
test_serialize_model 0.2025s 0.1213s 8.2427 Ops/s 8.1973 Ops/s $\color{#35bf28}+0.55\%$
test_serialize_model_pickle 0.4520s 0.3817s 2.6198 Ops/s 2.6144 Ops/s $\color{#35bf28}+0.21\%$
test_serialize_weights 0.2014s 0.1181s 8.4690 Ops/s 8.3725 Ops/s $\color{#35bf28}+1.15\%$
test_serialize_weights_returnearly 0.2139s 0.1412s 7.0819 Ops/s 7.6530 Ops/s $\textbf{\color{#d91a1a}-7.46\%}$
test_serialize_weights_pickle 1.2470s 0.6096s 1.6404 Ops/s 2.4440 Ops/s $\textbf{\color{#d91a1a}-32.88\%}$
test_serialize_weights_filesystem 0.1005s 93.6594ms 10.6770 Ops/s 9.6574 Ops/s $\textbf{\color{#35bf28}+10.56\%}$
test_serialize_model_filesystem 0.1032s 96.2173ms 10.3931 Ops/s 9.7874 Ops/s $\textbf{\color{#35bf28}+6.19\%}$
test_reshape_pytree 62.2370μs 25.0278μs 39.9556 KOps/s 38.5395 KOps/s $\color{#35bf28}+3.67\%$
test_reshape_td 74.5400μs 33.2656μs 30.0611 KOps/s 30.4101 KOps/s $\color{#d91a1a}-1.15\%$
test_view_pytree 65.5820μs 25.0151μs 39.9758 KOps/s 39.7065 KOps/s $\color{#35bf28}+0.68\%$
test_view_td 84.2970μs 37.0670μs 26.9782 KOps/s 27.0001 KOps/s $\color{#d91a1a}-0.08\%$
test_unbind_pytree 72.7660μs 29.2093μs 34.2357 KOps/s 34.4291 KOps/s $\color{#d91a1a}-0.56\%$
test_unbind_td 0.4224ms 38.6777μs 25.8547 KOps/s 26.0576 KOps/s $\color{#d91a1a}-0.78\%$
test_split_pytree 74.9910μs 29.2055μs 34.2401 KOps/s 34.4534 KOps/s $\color{#d91a1a}-0.62\%$
test_split_td 0.5562ms 40.8798μs 24.4620 KOps/s 24.4830 KOps/s $\color{#d91a1a}-0.09\%$
test_add_pytree 89.3170μs 34.3633μs 29.1008 KOps/s 28.1000 KOps/s $\color{#35bf28}+3.56\%$
test_add_td 0.1821ms 56.6024μs 17.6671 KOps/s 16.5113 KOps/s $\textbf{\color{#35bf28}+7.00\%}$
test_distributed 0.2808ms 0.1037ms 9.6466 KOps/s 9.5532 KOps/s $\color{#35bf28}+0.98\%$
test_tdmodule 41.4980μs 15.3913μs 64.9717 KOps/s 51.2776 KOps/s $\textbf{\color{#35bf28}+26.71\%}$
test_tdmodule_dispatch 72.3250μs 31.7916μs 31.4548 KOps/s 26.2583 KOps/s $\textbf{\color{#35bf28}+19.79\%}$
test_tdseq 44.4940μs 16.7557μs 59.6812 KOps/s 45.9899 KOps/s $\textbf{\color{#35bf28}+29.77\%}$
test_tdseq_dispatch 0.1146ms 36.6190μs 27.3082 KOps/s 23.7610 KOps/s $\textbf{\color{#35bf28}+14.93\%}$
test_instantiation_functorch 89.0834ms 1.4957ms 668.5699 Ops/s 745.2147 Ops/s $\textbf{\color{#d91a1a}-10.28\%}$
test_instantiation_td 1.7813ms 1.0715ms 933.2562 Ops/s 959.0876 Ops/s $\color{#d91a1a}-2.69\%$
test_exec_functorch 0.3578ms 0.1650ms 6.0594 KOps/s 6.1246 KOps/s $\color{#d91a1a}-1.07\%$
test_exec_functional_call 0.3149ms 0.1506ms 6.6380 KOps/s 6.6391 KOps/s $\color{#d91a1a}-0.02\%$
test_exec_td 0.2358ms 0.1458ms 6.8600 KOps/s 6.6511 KOps/s $\color{#35bf28}+3.14\%$
test_exec_td_decorator 0.9370ms 0.2372ms 4.2164 KOps/s 4.5175 KOps/s $\textbf{\color{#d91a1a}-6.67\%}$
test_vmap_mlp_speed[True-True] 0.8070ms 0.4864ms 2.0559 KOps/s 2.0142 KOps/s $\color{#35bf28}+2.07\%$
test_vmap_mlp_speed[True-False] 1.4318ms 0.4866ms 2.0552 KOps/s 2.0185 KOps/s $\color{#35bf28}+1.82\%$
test_vmap_mlp_speed[False-True] 0.7875ms 0.4109ms 2.4338 KOps/s 2.5000 KOps/s $\color{#d91a1a}-2.65\%$
test_vmap_mlp_speed[False-False] 0.7459ms 0.3984ms 2.5098 KOps/s 2.5018 KOps/s $\color{#35bf28}+0.32\%$
test_vmap_mlp_speed_decorator[True-True] 3.7264ms 0.5696ms 1.7556 KOps/s 1.7716 KOps/s $\color{#d91a1a}-0.90\%$
test_vmap_mlp_speed_decorator[True-False] 0.8161ms 0.5668ms 1.7642 KOps/s 1.7664 KOps/s $\color{#d91a1a}-0.12\%$
test_vmap_mlp_speed_decorator[False-True] 1.0138ms 0.4734ms 2.1122 KOps/s 2.1557 KOps/s $\color{#d91a1a}-2.02\%$
test_vmap_mlp_speed_decorator[False-False] 0.7826ms 0.4753ms 2.1038 KOps/s 2.1506 KOps/s $\color{#d91a1a}-2.18\%$
test_to_module_speed[True] 2.4661ms 1.8730ms 533.9168 Ops/s 591.8877 Ops/s $\textbf{\color{#d91a1a}-9.79\%}$
test_to_module_speed[False] 2.7274ms 1.8168ms 550.4183 Ops/s 606.9606 Ops/s $\textbf{\color{#d91a1a}-9.32\%}$
test_tc_init 53.3400μs 23.9986μs 41.6690 KOps/s 14.6495 KOps/s $\textbf{\color{#35bf28}+184.44\%}$
test_tc_init_nested 0.1145ms 49.3863μs 20.2485 KOps/s 7.1723 KOps/s $\textbf{\color{#35bf28}+182.31\%}$
test_tc_first_layer_tensor 47.2380μs 1.3390μs 746.8419 KOps/s 167.3070 KOps/s $\textbf{\color{#35bf28}+346.39\%}$
test_tc_first_layer_nontensor 22.4020μs 1.3551μs 737.9591 KOps/s 167.1346 KOps/s $\textbf{\color{#35bf28}+341.54\%}$
test_tc_second_layer_tensor 35.6970μs 1.6018μs 624.3163 KOps/s 87.3421 KOps/s $\textbf{\color{#35bf28}+614.79\%}$
test_tc_second_layer_nontensor 39.9150μs 2.0329μs 491.9056 KOps/s 86.3120 KOps/s $\textbf{\color{#35bf28}+469.92\%}$
test_unbind 0.1054s 6.8174ms 146.6837 Ops/s 92.3175 Ops/s $\textbf{\color{#35bf28}+58.89\%}$
test_full_like 17.1318ms 11.5663ms 86.4582 Ops/s 82.8618 Ops/s $\color{#35bf28}+4.34\%$
test_zeros_like 11.2756ms 6.0729ms 164.6654 Ops/s 169.2976 Ops/s $\color{#d91a1a}-2.74\%$
test_ones_like 12.1817ms 6.5507ms 152.6544 Ops/s 171.4725 Ops/s $\textbf{\color{#d91a1a}-10.97\%}$
test_clone 13.0328ms 8.6778ms 115.2361 Ops/s 125.2599 Ops/s $\textbf{\color{#d91a1a}-8.00\%}$
test_squeeze 0.2091ms 12.6299μs 79.1772 KOps/s 35.9712 KOps/s $\textbf{\color{#35bf28}+120.11\%}$
test_unsqueeze 0.2059ms 65.2522μs 15.3252 KOps/s 10.0471 KOps/s $\textbf{\color{#35bf28}+52.53\%}$
test_split 0.3634ms 0.1080ms 9.2594 KOps/s 6.0462 KOps/s $\textbf{\color{#35bf28}+53.14\%}$
test_permute 0.3979ms 0.1309ms 7.6407 KOps/s 5.5597 KOps/s $\textbf{\color{#35bf28}+37.43\%}$
test_stack 30.5953ms 24.2135ms 41.2994 Ops/s 40.8224 Ops/s $\color{#35bf28}+1.17\%$
test_cat 33.6165ms 24.5304ms 40.7658 Ops/s 40.5558 Ops/s $\color{#35bf28}+0.52\%$
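
The Change column can be reproduced from the two throughput columns. The check below assumes Change is the relative difference between Ops (this PR) and Ops on Repo HEAD, which matches the rows tested; `percent_change` is a helper name introduced here for illustration.

```python
def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


# (name, Ops in KOps/s, Ops on Repo HEAD in KOps/s, reported Change in %)
rows = [
    ("test_plain_set_nested", 50.9177, 56.3278, -9.60),
    ("test_keys", 922.9771, 254.6147, 262.50),
    ("test_tdmodule", 64.9717, 51.2776, 26.71),
]

for name, ops_pr, ops_head, reported in rows:
    computed = percent_change(ops_pr, ops_head)
    print(f"{name}: computed {computed:+.2f}%, reported {reported:+.2f}%")
```

Rows that mix units, such as test_items_nested (Ops/s against KOps/s), need converting to a common unit before applying the formula.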

github-actions bot commented May 24, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: 37. Worsened: 36.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.5120ms 16.5439μs 60.4453 KOps/s 85.4082 KOps/s $\textbf{\color{#d91a1a}-29.23\%}$
test_plain_set_stack_nested 36.6000μs 16.6309μs 60.1289 KOps/s 85.4786 KOps/s $\textbf{\color{#d91a1a}-29.66\%}$
test_plain_set_nested_inplace 47.2210μs 17.0188μs 58.7586 KOps/s 77.3594 KOps/s $\textbf{\color{#d91a1a}-24.04\%}$
test_plain_set_stack_nested_inplace 37.9210μs 17.1269μs 58.3876 KOps/s 76.5674 KOps/s $\textbf{\color{#d91a1a}-23.74\%}$
test_items 21.4000μs 4.6021μs 217.2927 KOps/s 211.2790 KOps/s $\color{#35bf28}+2.85\%$
test_items_nested 1.8565ms 0.9813ms 1.0190 KOps/s 2.9533 KOps/s $\textbf{\color{#d91a1a}-65.49\%}$
test_items_nested_locked 1.0520ms 0.9948ms 1.0052 KOps/s 2.9313 KOps/s $\textbf{\color{#d91a1a}-65.71\%}$
test_items_nested_leaf 0.1095ms 84.1847μs 11.8786 KOps/s 12.0914 KOps/s $\color{#d91a1a}-1.76\%$
test_items_stack_nested 1.0593ms 0.9790ms 1.0215 KOps/s 2.9374 KOps/s $\textbf{\color{#d91a1a}-65.22\%}$
test_items_stack_nested_leaf 0.1106ms 88.0183μs 11.3613 KOps/s 11.9440 KOps/s $\color{#d91a1a}-4.88\%$
test_items_stack_nested_locked 0.9979ms 0.9799ms 1.0206 KOps/s 2.9182 KOps/s $\textbf{\color{#d91a1a}-65.03\%}$
test_keys 6.8700μs 1.8170μs 550.3711 KOps/s 214.0092 KOps/s $\textbf{\color{#35bf28}+157.17\%}$
test_keys_nested 91.8420μs 65.9254μs 15.1687 KOps/s 14.7820 KOps/s $\color{#35bf28}+2.62\%$
test_keys_nested_locked 89.7630μs 70.5922μs 14.1659 KOps/s 13.8596 KOps/s $\color{#35bf28}+2.21\%$
test_keys_nested_leaf 81.0520μs 56.6082μs 17.6653 KOps/s 17.2202 KOps/s $\color{#35bf28}+2.58\%$
test_keys_stack_nested 84.5020μs 65.4832μs 15.2711 KOps/s 15.0067 KOps/s $\color{#35bf28}+1.76\%$
test_keys_stack_nested_leaf 81.5810μs 56.1914μs 17.7963 KOps/s 17.3739 KOps/s $\color{#35bf28}+2.43\%$
test_keys_stack_nested_locked 0.1153ms 70.2529μs 14.2343 KOps/s 14.0083 KOps/s $\color{#35bf28}+1.61\%$
test_values 9.1270μs 1.7761μs 563.0336 KOps/s 546.5834 KOps/s $\color{#35bf28}+3.01\%$
test_values_nested 41.1710μs 25.7379μs 38.8532 KOps/s 28.4864 KOps/s $\textbf{\color{#35bf28}+36.39\%}$
test_values_nested_locked 44.3010μs 27.5811μs 36.2567 KOps/s 27.0964 KOps/s $\textbf{\color{#35bf28}+33.81\%}$
test_values_nested_leaf 88.8310μs 21.8252μs 45.8187 KOps/s 31.9156 KOps/s $\textbf{\color{#35bf28}+43.56\%}$
test_values_stack_nested 45.4710μs 26.0624μs 38.3695 KOps/s 27.6151 KOps/s $\textbf{\color{#35bf28}+38.94\%}$
test_values_stack_nested_leaf 45.1710μs 22.1288μs 45.1900 KOps/s 31.1179 KOps/s $\textbf{\color{#35bf28}+45.22\%}$
test_values_stack_nested_locked 45.6310μs 28.1055μs 35.5803 KOps/s 26.5295 KOps/s $\textbf{\color{#35bf28}+34.12\%}$
test_membership 0.8217μs 0.1938μs 5.1606 MOps/s 1.3849 MOps/s $\textbf{\color{#35bf28}+272.63\%}$
test_membership_nested 0.9971ms 2.2112μs 452.2437 KOps/s 389.8364 KOps/s $\textbf{\color{#35bf28}+16.01\%}$
test_membership_nested_leaf 10.9950μs 2.1119μs 473.5068 KOps/s 389.1113 KOps/s $\textbf{\color{#35bf28}+21.69\%}$
test_membership_stacked_nested 30.5200μs 2.1719μs 460.4211 KOps/s 383.0130 KOps/s $\textbf{\color{#35bf28}+20.21\%}$
test_membership_stacked_nested_leaf 17.3700μs 2.1414μs 466.9820 KOps/s 392.0930 KOps/s $\textbf{\color{#35bf28}+19.10\%}$
test_membership_nested_last 21.4100μs 2.7985μs 357.3371 KOps/s 321.7233 KOps/s $\textbf{\color{#35bf28}+11.07\%}$
test_membership_nested_leaf_last 20.4410μs 2.8329μs 352.9998 KOps/s 325.2331 KOps/s $\textbf{\color{#35bf28}+8.54\%}$
test_membership_stacked_nested_last 39.4810μs 11.0739μs 90.3023 KOps/s 101.8798 KOps/s $\textbf{\color{#d91a1a}-11.36\%}$
test_membership_stacked_nested_leaf_last 86.6420μs 11.0413μs 90.5690 KOps/s 101.9397 KOps/s $\textbf{\color{#d91a1a}-11.15\%}$
test_nested_getleaf 26.6410μs 11.1623μs 89.5873 KOps/s 120.3027 KOps/s $\textbf{\color{#d91a1a}-25.53\%}$
test_nested_get 31.9310μs 10.5475μs 94.8089 KOps/s 127.7897 KOps/s $\textbf{\color{#d91a1a}-25.81\%}$
test_stacked_getleaf 27.2800μs 11.1383μs 89.7803 KOps/s 119.0125 KOps/s $\textbf{\color{#d91a1a}-24.56\%}$
test_stacked_get 28.0700μs 10.5547μs 94.7447 KOps/s 127.0954 KOps/s $\textbf{\color{#d91a1a}-25.45\%}$
test_nested_getitemleaf 28.9310μs 11.3491μs 88.1129 KOps/s 118.1388 KOps/s $\textbf{\color{#d91a1a}-25.42\%}$
test_nested_getitem 25.1700μs 10.7171μs 93.3085 KOps/s 125.0526 KOps/s $\textbf{\color{#d91a1a}-25.38\%}$
test_stacked_getitemleaf 27.1710μs 11.2939μs 88.5433 KOps/s 117.0916 KOps/s $\textbf{\color{#d91a1a}-24.38\%}$
test_stacked_getitem 27.5910μs 10.6897μs 93.5479 KOps/s 125.2583 KOps/s $\textbf{\color{#d91a1a}-25.32\%}$
test_lock_nested 84.3050ms 0.4391ms 2.2776 KOps/s 2.4532 KOps/s $\textbf{\color{#d91a1a}-7.16\%}$
test_lock_stack_nested 0.3321ms 0.3018ms 3.3131 KOps/s 3.3581 KOps/s $\color{#d91a1a}-1.34\%$
test_unlock_nested 0.7369ms 0.3537ms 2.8276 KOps/s 2.8505 KOps/s $\color{#d91a1a}-0.80\%$
test_unlock_stack_nested 0.3403ms 0.3107ms 3.2180 KOps/s 3.2669 KOps/s $\color{#d91a1a}-1.50\%$
test_flatten_speed 0.1905ms 0.1032ms 9.6919 KOps/s 9.7813 KOps/s $\color{#d91a1a}-0.91\%$
test_unflatten_speed 0.4454ms 0.3074ms 3.2534 KOps/s 3.4522 KOps/s $\textbf{\color{#d91a1a}-5.76\%}$
test_common_ops 0.8915ms 0.5758ms 1.7368 KOps/s 1.8446 KOps/s $\textbf{\color{#d91a1a}-5.84\%}$
test_creation 31.1800μs 1.5210μs 657.4745 KOps/s 618.8354 KOps/s $\textbf{\color{#35bf28}+6.24\%}$
test_creation_empty 26.3200μs 9.7380μs 102.6906 KOps/s 150.0572 KOps/s $\textbf{\color{#d91a1a}-31.57\%}$
test_creation_nested_1 31.5700μs 12.0299μs 83.1264 KOps/s 118.4040 KOps/s $\textbf{\color{#d91a1a}-29.79\%}$
test_creation_nested_2 40.1300μs 13.3170μs 75.0920 KOps/s 93.3811 KOps/s $\textbf{\color{#d91a1a}-19.59\%}$
test_clone 72.9320μs 11.4990μs 86.9642 KOps/s 85.9959 KOps/s $\color{#35bf28}+1.13\%$
test_getitem[int] 32.9310μs 10.5140μs 95.1113 KOps/s 91.8083 KOps/s $\color{#35bf28}+3.60\%$
test_getitem[slice_int] 42.4110μs 21.3795μs 46.7739 KOps/s 49.1679 KOps/s $\color{#d91a1a}-4.87\%$
test_getitem[range] 0.1712ms 35.7877μs 27.9426 KOps/s 21.9333 KOps/s $\textbf{\color{#35bf28}+27.40\%}$
test_getitem[tuple] 37.7010μs 17.7869μs 56.2210 KOps/s 54.8434 KOps/s $\color{#35bf28}+2.51\%$
test_getitem[list] 0.1863ms 31.0698μs 32.1856 KOps/s 31.9824 KOps/s $\color{#35bf28}+0.64\%$
test_setitem_dim[int] 42.9910μs 26.6080μs 37.5826 KOps/s 38.7587 KOps/s $\color{#d91a1a}-3.03\%$
test_setitem_dim[slice_int] 82.9220μs 48.3387μs 20.6874 KOps/s 21.6153 KOps/s $\color{#d91a1a}-4.29\%$
test_setitem_dim[range] 0.1091ms 62.7405μs 15.9387 KOps/s 15.4653 KOps/s $\color{#35bf28}+3.06\%$
test_setitem_dim[tuple] 60.8110μs 40.7177μs 24.5593 KOps/s 24.9477 KOps/s $\color{#d91a1a}-1.56\%$
test_setitem 76.0320μs 17.0013μs 58.8189 KOps/s 65.7480 KOps/s $\textbf{\color{#d91a1a}-10.54\%}$
test_set 75.3820μs 16.4635μs 60.7405 KOps/s 67.5155 KOps/s $\textbf{\color{#d91a1a}-10.03\%}$
test_set_shared 3.0868ms 96.7571μs 10.3352 KOps/s 9.9131 KOps/s $\color{#35bf28}+4.26\%$
test_update 88.0220μs 19.2612μs 51.9178 KOps/s 62.0546 KOps/s $\textbf{\color{#d91a1a}-16.34\%}$
test_update_nested 95.1310μs 23.9470μs 41.7589 KOps/s 47.2324 KOps/s $\textbf{\color{#d91a1a}-11.59\%}$
test_update__nested 71.8220μs 22.5794μs 44.2881 KOps/s 45.1775 KOps/s $\color{#d91a1a}-1.97\%$
test_set_nested 94.3710μs 18.1981μs 54.9508 KOps/s 62.7756 KOps/s $\textbf{\color{#d91a1a}-12.46\%}$
test_set_nested_new 81.0620μs 20.8046μs 48.0663 KOps/s 54.4475 KOps/s $\textbf{\color{#d91a1a}-11.72\%}$
test_select 99.1120μs 34.1060μs 29.3203 KOps/s 30.9369 KOps/s $\textbf{\color{#d91a1a}-5.23\%}$
test_select_nested 70.9310μs 53.4647μs 18.7039 KOps/s 18.6314 KOps/s $\color{#35bf28}+0.39\%$
test_exclude_nested 94.0820μs 68.3613μs 14.6282 KOps/s 9.0119 KOps/s $\textbf{\color{#35bf28}+62.32\%}$
test_empty[True] 0.3732ms 0.2703ms 3.6993 KOps/s 2.8582 KOps/s $\textbf{\color{#35bf28}+29.43\%}$
test_empty[False] 3.0421μs 0.8537μs 1.1714 MOps/s 1.1499 MOps/s $\color{#35bf28}+1.87\%$
test_to 0.1009ms 75.4189μs 13.2593 KOps/s 13.2257 KOps/s $\color{#35bf28}+0.25\%$
test_to_nonblocking 87.8010μs 60.8810μs 16.4255 KOps/s 16.4796 KOps/s $\color{#d91a1a}-0.33\%$
test_unbind_speed 0.3030ms 0.2672ms 3.7423 KOps/s 3.7598 KOps/s $\color{#d91a1a}-0.46\%$
test_unbind_speed_stack0 0.3093ms 0.2648ms 3.7760 KOps/s 3.7765 KOps/s $\color{#d91a1a}-0.01\%$
test_unbind_speed_stack1 0.7110ms 0.6725ms 1.4870 KOps/s 1.2720 KOps/s $\textbf{\color{#35bf28}+16.91\%}$
test_split 86.8956ms 1.6337ms 612.1226 Ops/s 611.7728 Ops/s $\color{#35bf28}+0.06\%$
test_chunk 84.8357ms 1.6244ms 615.6265 Ops/s 613.4545 Ops/s $\color{#35bf28}+0.35\%$
test_creation[device0] 0.1249ms 58.3538μs 17.1369 KOps/s 17.7474 KOps/s $\color{#d91a1a}-3.44\%$
test_creation_from_tensor 0.1267ms 56.0169μs 17.8518 KOps/s 18.7027 KOps/s $\color{#d91a1a}-4.55\%$
test_add_one[memmap_tensor0] 86.3710μs 7.3831μs 135.4442 KOps/s 142.5487 KOps/s $\color{#d91a1a}-4.98\%$
test_contiguous[memmap_tensor0] 11.7110μs 0.6241μs 1.6024 MOps/s 1.4846 MOps/s $\textbf{\color{#35bf28}+7.94\%}$
test_stack[memmap_tensor0] 36.8010μs 4.7164μs 212.0251 KOps/s 215.6746 KOps/s $\color{#d91a1a}-1.69\%$
test_memmaptd_index 1.1119ms 0.2882ms 3.4693 KOps/s 3.5072 KOps/s $\color{#d91a1a}-1.08\%$
test_memmaptd_index_astensor 0.6444ms 0.3564ms 2.8055 KOps/s 2.7967 KOps/s $\color{#35bf28}+0.31\%$
test_memmaptd_index_op 1.0255ms 0.6905ms 1.4483 KOps/s 1.6165 KOps/s $\textbf{\color{#d91a1a}-10.40\%}$
test_serialize_model 0.1931s 0.1122s 8.9096 Ops/s 8.6867 Ops/s $\color{#35bf28}+2.57\%$
test_serialize_model_pickle 1.3506s 1.2354s 0.8095 Ops/s 0.8083 Ops/s $\color{#35bf28}+0.15\%$
test_serialize_weights 0.1891s 0.1101s 9.0787 Ops/s 8.8262 Ops/s $\color{#35bf28}+2.86\%$
test_serialize_weights_returnearly 0.2448s 97.7548ms 10.2297 Ops/s 10.6156 Ops/s $\color{#d91a1a}-3.64\%$
test_serialize_weights_pickle 1.3805s 1.2522s 0.7986 Ops/s 0.7989 Ops/s $\color{#d91a1a}-0.04\%$
test_reshape_pytree 59.9510μs 25.4479μs 39.2960 KOps/s 37.3536 KOps/s $\textbf{\color{#35bf28}+5.20\%}$
test_reshape_td 54.7110μs 29.3470μs 34.0750 KOps/s 32.9111 KOps/s $\color{#35bf28}+3.54\%$
test_view_pytree 48.8310μs 25.2335μs 39.6299 KOps/s 38.7542 KOps/s $\color{#35bf28}+2.26\%$
test_view_td 65.9110μs 33.0422μs 30.2643 KOps/s 29.7415 KOps/s $\color{#35bf28}+1.76\%$
test_unbind_pytree 60.4510μs 31.6037μs 31.6419 KOps/s 31.4949 KOps/s $\color{#35bf28}+0.47\%$
test_unbind_td 0.4469ms 39.8041μs 25.1230 KOps/s 24.6439 KOps/s $\color{#35bf28}+1.94\%$
test_split_pytree 59.7030μs 33.6530μs 29.7150 KOps/s 27.4868 KOps/s $\textbf{\color{#35bf28}+8.11\%}$
test_split_td 0.2492ms 37.9331μs 26.3622 KOps/s 25.9181 KOps/s $\color{#35bf28}+1.71\%$
test_add_pytree 0.1903ms 40.9472μs 24.4217 KOps/s 26.2077 KOps/s $\textbf{\color{#d91a1a}-6.81\%}$
test_add_td 0.1995ms 51.5066μs 19.4150 KOps/s 22.1697 KOps/s $\textbf{\color{#d91a1a}-12.43\%}$
test_distributed 1.9274ms 88.7322μs 11.2699 KOps/s 11.2727 KOps/s $\color{#d91a1a}-0.02\%$
test_tdmodule 47.3800μs 13.3927μs 74.6674 KOps/s 71.6806 KOps/s $\color{#35bf28}+4.17\%$
test_tdmodule_dispatch 45.2010μs 27.7505μs 36.0354 KOps/s 36.3467 KOps/s $\color{#d91a1a}-0.86\%$
test_tdseq 30.4200μs 15.1085μs 66.1880 KOps/s 61.9801 KOps/s $\textbf{\color{#35bf28}+6.79\%}$
test_tdseq_dispatch 47.9810μs 30.8487μs 32.4163 KOps/s 32.7493 KOps/s $\color{#d91a1a}-1.02\%$
test_instantiation_functorch 1.6425ms 1.5227ms 656.7258 Ops/s 660.4085 Ops/s $\color{#d91a1a}-0.56\%$
test_instantiation_td 1.5037ms 1.0443ms 957.5773 Ops/s 867.1305 Ops/s $\textbf{\color{#35bf28}+10.43\%}$
test_exec_functorch 0.1929ms 0.1536ms 6.5093 KOps/s 6.5609 KOps/s $\color{#d91a1a}-0.79\%$
test_exec_functional_call 0.1900ms 0.1416ms 7.0607 KOps/s 7.0605 KOps/s $+0.00\%$
test_exec_td 0.1923ms 0.1412ms 7.0829 KOps/s 7.0909 KOps/s $\color{#d91a1a}-0.11\%$
test_exec_td_decorator 0.3264ms 0.2198ms 4.5494 KOps/s 4.7351 KOps/s $\color{#d91a1a}-3.92\%$
test_vmap_mlp_speed[True-True] 1.0505ms 0.5992ms 1.6688 KOps/s 1.6405 KOps/s $\color{#35bf28}+1.72\%$
test_vmap_mlp_speed[True-False] 0.7614ms 0.6004ms 1.6655 KOps/s 1.6418 KOps/s $\color{#35bf28}+1.45\%$
test_vmap_mlp_speed[False-True] 0.6050ms 0.5363ms 1.8646 KOps/s 1.7914 KOps/s $\color{#35bf28}+4.09\%$
test_vmap_mlp_speed[False-False] 0.5923ms 0.5310ms 1.8833 KOps/s 1.7890 KOps/s $\textbf{\color{#35bf28}+5.27\%}$
test_vmap_mlp_speed_decorator[True-True] 0.7455ms 0.6675ms 1.4982 KOps/s 1.4564 KOps/s $\color{#35bf28}+2.87\%$
test_vmap_mlp_speed_decorator[True-False] 0.7593ms 0.6712ms 1.4898 KOps/s 1.4480 KOps/s $\color{#35bf28}+2.88\%$
test_vmap_mlp_speed_decorator[False-True] 0.7696ms 0.5980ms 1.6721 KOps/s 1.6301 KOps/s $\color{#35bf28}+2.57\%$
test_vmap_mlp_speed_decorator[False-False] 0.7497ms 0.6008ms 1.6644 KOps/s 1.6657 KOps/s $\color{#d91a1a}-0.08\%$
test_vmap_transformer_speed[True-True] 8.3890ms 8.0166ms 124.7418 Ops/s 123.4568 Ops/s $\color{#35bf28}+1.04\%$
test_vmap_transformer_speed[True-False] 8.2281ms 8.0067ms 124.8957 Ops/s 124.2517 Ops/s $\color{#35bf28}+0.52\%$
test_vmap_transformer_speed[False-True] 8.1026ms 7.9549ms 125.7095 Ops/s 125.5127 Ops/s $\color{#35bf28}+0.16\%$
test_vmap_transformer_speed[False-False] 8.8096ms 7.9356ms 126.0146 Ops/s 126.2385 Ops/s $\color{#d91a1a}-0.18\%$
test_vmap_transformer_speed_decorator[True-True] 19.7115ms 19.5741ms 51.0880 Ops/s 51.5006 Ops/s $\color{#d91a1a}-0.80\%$
test_vmap_transformer_speed_decorator[True-False] 19.6626ms 19.5175ms 51.2361 Ops/s 51.5013 Ops/s $\color{#d91a1a}-0.51\%$
test_vmap_transformer_speed_decorator[False-True] 19.5896ms 19.4525ms 51.4073 Ops/s 51.7440 Ops/s $\color{#d91a1a}-0.65\%$
test_vmap_transformer_speed_decorator[False-False] 19.8407ms 19.4408ms 51.4382 Ops/s 47.4014 Ops/s $\textbf{\color{#35bf28}+8.52\%}$
test_to_module_speed[True] 1.7045ms 1.6031ms 623.7755 Ops/s 655.4987 Ops/s $\color{#d91a1a}-4.84\%$
test_to_module_speed[False] 1.7089ms 1.6047ms 623.1802 Ops/s 651.4367 Ops/s $\color{#d91a1a}-4.34\%$
test_tc_init 43.2110μs 26.4352μs 37.8283 KOps/s 20.5534 KOps/s $\textbf{\color{#35bf28}+84.05\%}$
test_tc_init_nested 85.2710μs 53.1763μs 18.8054 KOps/s 9.6094 KOps/s $\textbf{\color{#35bf28}+95.70\%}$
test_tc_first_layer_tensor 14.3300μs 0.7476μs 1.3376 MOps/s 220.4013 KOps/s $\textbf{\color{#35bf28}+506.88\%}$
test_tc_first_layer_nontensor 1.6725μs 0.6578μs 1.5203 MOps/s 219.9563 KOps/s $\textbf{\color{#35bf28}+591.20\%}$
test_tc_second_layer_tensor 1.3870μs 0.7137μs 1.4011 MOps/s 119.0046 KOps/s $\textbf{\color{#35bf28}+1077.33\%}$
test_tc_second_layer_nontensor 4.3502μs 0.9765μs 1.0241 MOps/s 114.8002 KOps/s $\textbf{\color{#35bf28}+792.08\%}$
test_unbind 95.5513ms 7.5917ms 131.7224 Ops/s 102.7390 Ops/s $\textbf{\color{#35bf28}+28.21\%}$
test_full_like 12.1830ms 11.4297ms 87.4917 Ops/s 85.4965 Ops/s $\color{#35bf28}+2.33\%$
test_zeros_like 8.0343ms 7.8430ms 127.5026 Ops/s 142.2836 Ops/s $\textbf{\color{#d91a1a}-10.39\%}$
test_ones_like 8.1541ms 7.8699ms 127.0663 Ops/s 142.1070 Ops/s $\textbf{\color{#d91a1a}-10.58\%}$
test_clone 9.7204ms 9.4910ms 105.3631 Ops/s 104.4180 Ops/s $\color{#35bf28}+0.91\%$
test_squeeze 51.6110μs 9.7683μs 102.3723 KOps/s 43.1811 KOps/s $\textbf{\color{#35bf28}+137.08\%}$
test_unsqueeze 0.1917ms 56.7205μs 17.6303 KOps/s 11.1491 KOps/s $\textbf{\color{#35bf28}+58.13\%}$
test_split 0.2291ms 91.9514μs 10.8753 KOps/s 6.6099 KOps/s $\textbf{\color{#35bf28}+64.53\%}$
test_permute 0.1834ms 0.1140ms 8.7713 KOps/s 5.9934 KOps/s $\textbf{\color{#35bf28}+46.35\%}$
test_stack 29.1505ms 27.6347ms 36.1863 Ops/s 35.9637 Ops/s $\color{#35bf28}+0.62\%$
test_cat 28.0752ms 27.6008ms 36.2308 Ops/s 36.0391 Ops/s $\color{#35bf28}+0.53\%$

@vmoens vmoens linked an issue May 27, 2024 that may be closed by this pull request
@vmoens vmoens added the enhancement New feature or request label May 27, 2024
return tuple(subk for k in key for subk in _unravel_key_to_tuple(k))


def _slice_indices(index: slice, len: int):

this can go into PyTorch core _dynamo/polyfill.py and then inline it in dynamo.
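
For illustration, a pure-Python helper of the kind a polyfill module would hold might look like the sketch below. The body of `_slice_indices` is not shown in this hunk, so `slice_indices_sketch` is a hypothetical stand-in that mirrors the semantics of the built-in `slice.indices`, not the code from this PR or from PyTorch.

```python
def slice_indices_sketch(index: slice, length: int) -> tuple:
    """Hypothetical pure-Python equivalent of ``index.indices(length)``."""
    step = 1 if index.step is None else index.step
    if step == 0:
        raise ValueError("slice step cannot be zero")

    # Clamping bounds depend on the direction of travel.
    if step > 0:
        lower, upper = 0, length
    else:
        lower, upper = -1, length - 1

    if index.start is None:
        start = upper if step < 0 else lower
    elif index.start < 0:
        start = max(index.start + length, lower)
    else:
        start = min(index.start, upper)

    if index.stop is None:
        stop = lower if step < 0 else upper
    elif index.stop < 0:
        stop = max(index.stop + length, lower)
    else:
        stop = min(index.stop, upper)

    return start, stop, step
```

Because it only uses integer comparisons and arithmetic, a helper like this avoids the method call on the `slice` object itself.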

Comment on lines +728 to +731
if not torch.compiler.is_dynamo_compiling():
    _tensordict = __dict__.get("_tensordict")
else:
    _tensordict = self._tensordict

I should eventually try to land pytorch/pytorch#118995...

Contributor Author

Yeah.
For context: tensordict can be very recursive, with many attributes hidden behind hacky checks in __getattr__ and similar methods, so we sometimes bypass them by reading __dict__ directly to fetch the variable we're looking for.

I guess this makes less sense under compile, so I'm happy to fall back on a regular getattr if that makes dynamo happy.
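
A minimal sketch of the pattern discussed here, assuming a hypothetical `Wrapper` class (not a tensordict class): in eager mode the attribute is read straight from the instance `__dict__`, and under dynamo the code falls back to ordinary attribute access. The fallback for a missing or older torch install is an assumption added only so the snippet stays self-contained.

```python
try:
    import torch

    _is_compiling = torch.compiler.is_dynamo_compiling
except (ImportError, AttributeError):
    # Assumption: without a recent torch, behave as if never compiling.
    def _is_compiling() -> bool:
        return False


class Wrapper:
    """Hypothetical class that routes unknown attributes via __getattr__."""

    def __init__(self, payload: dict):
        # Store the payload directly in the instance dictionary.
        self.__dict__["_payload"] = payload

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        payload = self.__dict__.get("_payload")
        if payload is not None and name in payload:
            return payload[name]
        raise AttributeError(name)

    def get_payload(self):
        if not _is_compiling():
            # Eager shortcut: read the instance dictionary directly.
            return self.__dict__.get("_payload")
        # Under dynamo: plain attribute access.
        return self._payload


w = Wrapper({"a": 1})
```

Both branches return the same object in this sketch; the difference is only in which lookup machinery is exercised.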

vmoens added 3 commits May 31, 2024 16:25
# Conflicts:
#	tensordict/base.py
#	tensordict/tensorclass.py
if default not in (None, dataclasses.MISSING):
kwargs.setdefault(key, default)
else:
# TODO: Decide what to do here
Contributor Author

@lezcano @anijain2305

Dynamo doesn't want us to iterate over self.__dataclass_fields__.items():

  File "/Users/vmoens/venv/rl/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1855, in CALL
    self.call_function(fn, args, kwargs)
  File "/Users/vmoens/venv/rl/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 739, in call_function
    self.push(fn.call_function(self, args, kwargs))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vmoens/venv/rl/lib/python3.11/site-packages/torch/_dynamo/variables/misc.py", line 668, in call_function
    return self.obj.call_method(tx, self.name, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vmoens/venv/rl/lib/python3.11/site-packages/torch/_dynamo/variables/misc.py", line 714, in call_method
    return super().call_method(tx, name, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vmoens/venv/rl/lib/python3.11/site-packages/torch/_dynamo/variables/base.py", line 320, in call_method
    unimplemented(f"call_method {self} {name} {args} {kwargs}")
  File "/Users/vmoens/venv/rl/lib/python3.11/site-packages/torch/_dynamo/exc.py", line 216, in unimplemented
    raise Unsupported(msg)
torch._dynamo.exc.Unsupported: call_method GetAttrVariable(UserDefinedObjectVariable(MyClass), __dataclass_fields__) items [] {}

tc.__post_init__()
return tc
else:
# TODO: things that did NOT work: **tensordict, dict(tensordict)
Contributor Author

@lezcano @anijain2305

Both dict(tensordict) and **tensordict are valid syntax, but only dict(tensordict.items()) worked.
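
For reference, the three spellings are equivalent in eager Python for any mapping. The sketch below uses a hypothetical `TinyMapping` class as a stand-in for a tensordict; it does not involve dynamo and only shows the eager behavior.

```python
from collections.abc import Mapping


class TinyMapping(Mapping):
    """Hypothetical read-only mapping standing in for a tensordict."""

    def __init__(self, **data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


def as_kwargs(**kwargs):
    # Returns whatever was unpacked into the call.
    return kwargs


m = TinyMapping(a=1, b=2)
via_dict = dict(m)             # dict(mapping)
via_unpack = as_kwargs(**m)    # **mapping
via_items = dict(m.items())    # dict(mapping.items())
```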

@@ -1802,8 +1880,9 @@ def _unbind(self, dim: int):
Resulting tensorclass instances will share the storage of the initial tensorclass instance.

"""
# TODO: dynamo doesn't like copy, using dict instead
Contributor Author

@lezcano @anijain2305

Interestingly, copy doesn't work, but dict(another_dict) does.
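
In eager Python the two calls are interchangeable: both build a new top-level dictionary whose values are the same objects as in the source. A small sketch with hypothetical data:

```python
source = {"x": [1, 2], "y": [3]}

via_copy = source.copy()  # shallow copy through the method
via_dict = dict(source)   # shallow copy through the constructor
```

Neither call copies the values, so the swap does not change which objects the result refers to.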

lezcano commented Jun 1, 2024

Can you please open a tracking issue in pytorch/pytorch with all these?

Contributor Author

vmoens commented Jun 28, 2024

PT tracking issue: pytorch/pytorch#129668

vmoens added 4 commits June 29, 2024 12:51
# Conflicts:
#	tensordict/_td.py
#	tensordict/_torch_func.py
#	tensordict/base.py
#	tensordict/tensorclass.py
#	tensordict/utils.py
# Conflicts:
#	tensordict/tensorclass.py
# Conflicts:
#	tensordict/_lazy.py
#	tensordict/nn/sequence.py
#	tensordict/tensorclass.py
#	tensordict/utils.py
@vmoens vmoens closed this Jul 16, 2024
@vmoens vmoens deleted the compile-compat branch October 21, 2024 14:02
Labels
CLA Signed, enhancement
Development

Successfully merging this pull request may close these issues.

Any plans to make it compatible with torch.jit?
3 participants